TuomasBorman
left a comment
Looks good, a couple of comments.
As the binning is based on ranks, shouldn't counts and relabundance lead to the same result?
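The question above can be checked with a minimal sketch. `quantile_bin()` is a hypothetical helper written for illustration only; it is not the implementation in this PR. It assigns each value to one of `bins` groups from its rank, and the check compares raw counts with the relative abundances of the same sample.

```r
# Hypothetical helper (not the PR's code): rank-based quantile binning
# of one numeric vector into `bins` groups numbered 1..bins.
quantile_bin <- function(x, bins = 4) {
    r <- rank(x, ties.method = "average")
    # Multiply before dividing so that exact bin boundaries stay exact.
    ceiling(r * bins / length(x))
}

# Made-up counts for a single sample.
counts <- c(0, 3, 12, 7, 150, 1, 40, 9)
relab <- counts / sum(counts)

quantile_bin(counts)
# 1 2 3 2 4 1 4 3
identical(quantile_bin(counts), quantile_bin(relab))
# TRUE
```

Dividing one sample by its total is a monotone rescaling, so the ranks and therefore the bins are unchanged. This holds when binning within a sample. When binning one feature across samples, each sample is divided by a different total, so counts and relabundance can rank differently.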
Also, as discussed over lunch: you could update OMA's ML chapter if binning improves the accuracy.
My concern is with using e.g. CLR-transformed values for the binning, which leads to unexpected bins. In that sense, a check that the input is relabundance or counts feels like a safety net.
Ahh, yes. Maybe you could check that the values are positive and throw an error if not, as the result would not make any sense.
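A minimal sketch of such a check is below. The function name `check_non_negative()`, the error message, and the CLR pseudocount of 1 are assumptions made for illustration; none of them come from the PR. Zeros are allowed here because count data usually contains them, so the check rejects only negative values.

```r
# Hypothetical input check (not the PR's code): stop if any value is negative.
check_non_negative <- function(x) {
    if (any(x < 0, na.rm = TRUE)) {
        stop("Binning expects non-negative values such as counts or ",
             "relative abundances.", call. = FALSE)
    }
    invisible(TRUE)
}

# Made-up counts for a single sample.
counts <- c(0, 3, 12, 7, 150)
# CLR with an assumed pseudocount of 1; values below the mean become negative.
clr <- log(counts + 1) - mean(log(counts + 1))

ok <- check_non_negative(counts)   # passes silently
msg <- tryCatch(check_non_negative(clr),
                error = function(e) conditionMessage(e))
msg
```

Here `msg` holds the error message, because the CLR values contain negatives, while the raw counts pass the check.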
This renaming aligns with other functions in the package.
The standard binning in R is done with the function "cut". It is widely used and has many useful arguments. I'm just wondering whether it should be supported or, more generally, whether multiple binning options should be supported and their differences explained in the documentation. But that can be another PR.
I'll look into using I think testing different binning methods would be beneficial. The quantile binning approach is supported by the BiomeGPT paper, but there are of course other options, such as simple equal-width bins. It seems logical to add other binning options in a separate PR if they turn out to be useful.
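To make the difference between the two strategies concrete, here is a small sketch using base R's `cut()` and `quantile()` on made-up values. It only illustrates the two binning styles mentioned above and is not the implementation in this PR.

```r
# Made-up, skewed abundance values.
x <- c(0, 1, 2, 3, 5, 8, 13, 100)

# Equal-width bins: cut() splits the range of x into 4 intervals of equal length.
equal_width <- cut(x, breaks = 4, labels = FALSE)
equal_width
# 1 1 1 1 1 1 1 4

# Quantile bins: break points are placed at the quartiles of x.
qbreaks <- quantile(x, probs = seq(0, 1, length.out = 5))
quantile_bins <- cut(x, breaks = qbreaks, include.lowest = TRUE, labels = FALSE)
quantile_bins
# 1 1 2 2 3 3 4 4
```

With skewed data, equal-width bins put almost everything into the first bin, while quantile bins hold roughly equal numbers of values. One caveat: `cut()` stops with an error when the quantile breaks are not unique, which can happen when a vector contains many zeros.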
@raivo-otus any updates?
Added a literature reference to the BiomeGPT paper describing the binning strategy, and changed the implementation to utilize
Adds a quantile-based binning transformation to the mia::transformAssay() function, as discussed in #800.
The default value, bin = 4, roughly reflects a division into "rare, low, medium, high", which is easy to understand.
Unit tests include checks for both sample- and feature-wise transforms.
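The difference between the two directions can be sketched on a plain matrix with features in rows and samples in columns. `quantile_bin()` is a hypothetical helper for illustration and the matrix is made-up data; neither is taken from the PR or its unit tests.

```r
# Hypothetical helper (not the PR's code): rank-based binning into 1..bins.
quantile_bin <- function(x, bins = 4) {
    ceiling(rank(x, ties.method = "average") * bins / length(x))
}

# Made-up abundance matrix: 4 taxa (rows) by 4 samples (columns).
mat <- matrix(
    c( 0,  5, 20, 100,
       1,  2,  3,   4,
      50,  0, 10,   7,
       8, 30,  1,   0),
    nrow = 4, byrow = TRUE,
    dimnames = list(paste0("taxon", 1:4), paste0("sample", 1:4))
)

# Sample-wise: bin the taxa within each sample (one column at a time).
sample_wise <- apply(mat, 2, quantile_bin)

# Feature-wise: bin each taxon across samples (one row at a time).
# apply() returns the per-row results as columns, so transpose back.
feature_wise <- t(apply(mat, 1, quantile_bin))
```

For example, taxon3 falls into bin 4 within sample1 (it is the most abundant taxon there), and sample1 also falls into bin 4 for taxon3 when binning feature-wise, while the remaining entries of the two results differ.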
Should the transformation default to the "relabundance" assay, or should the choice be left to the user?
Pending tasks:
Potential optimizations: